Truecluster matching
Abstract
Cluster matching by permuting cluster labels is important in many clustering contexts, such as cluster validation and cluster ensemble techniques. The classic approach is to minimize the Euclidean distance between two cluster solutions, which induces inappropriate stability in certain settings. We therefore present the truematch algorithm, which introduces two improvements that are best explained in the crisp case. First, instead of maximizing the trace of the cluster crosstable, we propose to maximize a χ² transformation of this crosstable. Thus, the trace is dominated not by the cells with the largest counts but by the cells with the most non-random observations, taking the marginals into account. Second, we suggest a probabilistic component to break ties and to make the matching algorithm truly random on random data. The truematch algorithm is designed as a building block of the truecluster framework and scales in polynomial time. First simulation results confirm that the truematch algorithm gives more consistent truecluster results for unequal cluster sizes. Free R software is available.
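The two ideas in the abstract can be illustrated with a small Python sketch for the crisp case. This is not the truematch implementation or its R interface. The function names are hypothetical, and the transformation used here (a signed chi-square contribution per cell, computed from the marginals) is an assumption, since the abstract does not spell out the exact formula. For readability the sketch searches all label permutations by brute force, which is factorial in the number of clusters; a polynomial-time assignment solver would be needed to match the scaling claimed in the abstract.

```python
import itertools
import random


def crosstable(a, b, k):
    """Count co-occurrences of labels (0..k-1) from two crisp cluster solutions."""
    table = [[0] * k for _ in range(k)]
    for i, j in zip(a, b):
        table[i][j] += 1
    return table


def chi_transform(table):
    """Signed chi-square contribution per cell (assumed form, not from the paper).

    Each cell becomes sign(obs - exp) * (obs - exp)^2 / exp, where exp is the
    count expected from the row and column marginals under independence.
    """
    n = sum(map(sum, table))
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    out = []
    for i, row in enumerate(table):
        out_row = []
        for j, obs in enumerate(row):
            exp = rows[i] * cols[j] / n
            if exp == 0:
                out_row.append(0.0)
            else:
                d = obs - exp
                out_row.append((1 if d >= 0 else -1) * d * d / exp)
        out.append(out_row)
    return out


def best_permutation(score, rng, tol=1e-12):
    """Return a permutation p maximizing sum(score[i][p[i]]).

    Ties are broken uniformly at random, so the result is random when the
    score table carries no information (brute force, for illustration only).
    """
    k = len(score)
    best, best_perms = None, []
    for perm in itertools.permutations(range(k)):
        total = sum(score[i][perm[i]] for i in range(k))
        if best is None or total > best + tol:
            best, best_perms = total, [perm]
        elif abs(total - best) <= tol:
            best_perms.append(perm)
    return rng.choice(best_perms)


# Two solutions with very unequal cluster sizes; crosstable is [[80, 15], [5, 0]].
a = [0] * 95 + [1] * 5
b = [0] * 80 + [1] * 15 + [0] * 5
table = crosstable(a, b, 2)
rng = random.Random(0)

print(best_permutation(table, rng))                 # raw trace: (0, 1)
print(best_permutation(chi_transform(table), rng))  # transformed: (1, 0)
```

In this example the raw trace keeps the labels as they are because the cell with 80 observations dominates, although that count is slightly below what the marginals alone would predict. The transformed table instead favors swapping the labels, since the off-diagonal cells hold more observations than expected by chance.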
Related resources
Truecluster: scalable statistical clustering with model selection
Data-based classification is fundamental to most branches of science. Despite progress in statistical computing and predictive modelling, cluster analysis still lacks model selection, robustness, and scalability to large datasets. We consider the important problem of deciding about the optimal number of clusters given an arbitrary definition of space and clusteriness. We show how to cons...
Truecluster: robust scalable clustering with model selection
Data-based classification is fundamental to most branches of science. While recent years have brought enormous progress in various areas of statistical computing and clustering, some general challenges in clustering remain: model selection, robustness, and scalability to large datasets. We consider the important problem of deciding on the optimal number of clusters, given an arbitrary definitio...
Matching Integral Graphs of Small Order
In this paper, we study matching integral graphs of small order. A graph is called matching integral if the zeros of its matching polynomial are all integers. Matching integral graphs were first studied by Akbari, Khalashi, et al., who characterized all traceable graphs that are matching integral and also studied matching integral regular graphs. Furthermore, it has been shown that there is no matc...
Fast Least Square Matching
Least square matching (LSM) is one of the most accurate image matching methods in photogrammetry and remote sensing. The main disadvantage of LSM is its high computational complexity, which results from the large size of the observation equations. To address this problem, this paper presents a novel method called fast least square matching (FLSM). The main idea of the proposed FLSM is decreasing t...
Publication date: 2007